OVERVIEW


MinIO is a high-performance, Amazon S3 compatible, distributed object storage system. By
following the methods and design philosophy of hyperscale computing providers, MinIO delivers
high performance and scalability to a wide variety of workloads in the private cloud. Because
MinIO is purpose-built to serve only objects, a single-layer architecture achieves all of the
necessary functionality without compromise. The advantage of this design is an object server
that is simultaneously performant and lightweight.


Splunk is a high-performance event processing platform for enterprise computing environments
that provides critical and timely insight into IT operations, including data from IoT, firewalls, web
servers and more. The SmartStore feature enables Splunk to offload petabytes of data to an
external Amazon S3 compatible object store. Disaggregating compute and storage frees
Splunk nodes to focus on indexing and search, while the object storage is free to focus on the
management, resilience and security of the data.


This whitepaper describes integration and performance testing between Splunk's SmartStore
functionality and MinIO object storage. The results show that the combination of MinIO and
Splunk SmartStore provides a high-performance, on-premises S3 store that allows for complete
control over an enterprise's event data. With the combination of Splunk's efficiency in search and
compression, and MinIO's ability to quickly store and retrieve data, the results of the "worst case
scenario" (searching against terabytes of events across 10 servers) are returned in seconds.


AUDIENCE 


This paper is intended for IT and security professionals who have experience setting up Splunk
and a basic understanding of MinIO. It assumes a high-level understanding of the
technologies described.


DEFINITION OF TERMS 


In this document the following terms are used. They are specific to either Splunk, MinIO, or
object storage as a whole.


S3 - Simple Storage Service, a cloud-based object storage system from Amazon. MinIO is
a drop-in replacement for Amazon S3 for Splunk's SmartStore.


Indexer - A Splunk node dedicated to collating events into actionable data. 


Indexer cluster - A group of Splunk nodes, also referred to as peer nodes, that work in
concert to provide redundant indexing and searching capability.


Cluster Master - A Splunk node dedicated to managing Splunk clusters.


Search Head - A Splunk node used to query multiple indexers at once. In this document, a
single search head is used to query all indexers in a cluster simultaneously.


Bucket - In the context of Splunk, a bucket represents a folder comprising a time-defined
collection of events. In the context of MinIO, a bucket represents a logical separation of
data. When Splunk creates buckets in MinIO, it uses the same naming convention as used
to create buckets on the indexer nodes, providing a one-to-one mapping of data from the
filesystem to the backend object store.


Erasure code - A mathematical algorithm used to reconstruct missing or corrupted data. It
provides data resiliency in a smaller footprint than data replication normally requires.


ENVIRONMENT 


Physical instances were deployed via Packet.net. For Splunk indexers and MinIO nodes, m2.xlarge
instances were deployed, providing 28 cores per server, 384 GB of RAM and 4TB of usable disk
space. As shown later in the document, MinIO requires only moderate amounts of RAM and
CPU. However, these instances provided fast storage via NVMe drives with a theoretical
maximum of 2.6GB/s bandwidth. Network throughput as measured by iperf was ~20Gb/s. Disks
were formatted with XFS using default values.


Both the MinIO cluster and the Splunk indexer cluster had 10 nodes each.


The Splunk Search Head and Cluster Master were both provisioned with smaller c2.medium
instances, as they do not have the same requirements as the clustered nodes, but still have the
important 20Gb/s network bandwidth to ensure the search head is not bottlenecked on the network.


All nodes were provisioned with Ubuntu 18.04.2 LTS as the OS. Splunk nodes were installed with
Splunk Enterprise version 7.3.0. The Splunk indexer cluster was configured with a replication factor
of 3 and a search factor of 2.


                        Instance Type        # Nodes   Replication
MinIO Nodes             m2.xlarge:           10        Erasure Code EC:2
                        - 28 cores
Splunk Indexers         - 384 GB RAM         10        3x Replication,
                        - 4TB total disk               2x Search

Splunk Search Head      c2.medium:
Splunk Cluster Master   - 20 Gb/s NIC


CONFIGURING MINIO


MinIO was configured as a systemd service in a ten-node cluster, where each node in the MinIO
cluster had a single, dedicated disk. The erasure code ratio was set to EC:2, allowing up to
two disks to be lost without losing data. This was chosen to leave sufficient
storage for testing. Based on the performance results listed in this paper, a higher
erasure count would have had only a negligible effect.


MINIO_VOLUMES="http://10.88.126.{21...30}:9000/opt/minio"
MINIO_STORAGE_CLASS_STANDARD="EC:2"
# Access key of the server.
MINIO_ACCESS_KEY=<your access key>
# Secret key of the server.
MINIO_SECRET_KEY=<your secret key>
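

To put the EC:2 choice in context, the following is a minimal sketch of the capacity arithmetic. It assumes a single erasure set spanning all ten drives with two of them' worth of capacity used for parity; the helper name usable_fraction is hypothetical, and the 3x replication figure is included only as a comparison point.

```python
# Hypothetical helper: compare the raw-capacity overhead of erasure coding
# with N-way replication. Assumes one erasure set spanning all drives.

def usable_fraction(total_drives: int, parity_drives: int) -> float:
    """Fraction of raw capacity available for data under erasure coding."""
    if not 0 <= parity_drives < total_drives:
        raise ValueError("parity must be non-negative and below the drive count")
    return (total_drives - parity_drives) / total_drives


drives = 10   # ten nodes, one dedicated disk each (from the text)
parity = 2    # EC:2

ec_fraction = usable_fraction(drives, parity)
ec_overhead = 1 / ec_fraction          # raw bytes stored per byte of data
replication_overhead = 3.0             # 3x replication, for comparison only

print(f"EC:{parity} usable capacity: {ec_fraction:.0%}")
print(f"EC:{parity} overhead: {ec_overhead:.2f}x vs replication {replication_overhead:.0f}x")
```

Under these assumptions, tolerating two lost disks costs a quarter more raw capacity than the data itself, rather than the additional two full copies that three-way replication would need.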


CONFIGURING SMART STORE 


SmartStore configuration is simple and requires just a few additional lines in the existing
indexes.conf file:


[volume:s3]
storageType = remote
path = s3://smartstore/remote_volume
remote.s3.access_key = minio
remote.s3.secret_key = minio123
remote.s3.supports_versioning = false
remote.s3.endpoint = http://miniocluster:9000


The path parameter locates the bucket that will hold the data created in this test. On the MinIO
service, a bucket named "smartstore" was created for this purpose. It is mandatory to create the
bucket configured for SmartStore before applying the new indexes.conf settings.


The remote.s3.access_key and remote.s3.secret_key are set to the MinIO instance keys configured in
the section above. Since MinIO does not support versioning, the remote.s3.supports_versioning
parameter is set to false. The remote.s3.endpoint locates the MinIO cluster where the
smartstore bucket is created.
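

As an illustration of how the path and endpoint settings break down, the sketch below parses the stanza with Python's standard configparser and urllib modules. This is only a reading aid, not how Splunk itself processes indexes.conf; the variable names are hypothetical.

```python
import configparser
from urllib.parse import urlparse

# The stanza from this paper, reproduced as a string for illustration.
STANZA = """
[volume:s3]
storageType = remote
path = s3://smartstore/remote_volume
remote.s3.access_key = minio
remote.s3.secret_key = minio123
remote.s3.supports_versioning = false
remote.s3.endpoint = http://miniocluster:9000
"""

# Restrict delimiters to "=" so colons inside URLs are never treated as separators.
parser = configparser.ConfigParser(delimiters=("=",))
parser.optionxform = str  # keep option names case-sensitive (storageType)
parser.read_string(STANZA)

volume = parser["volume:s3"]
remote = urlparse(volume["path"])
bucket = remote.netloc              # bucket that must exist before the push
prefix = remote.path.lstrip("/")    # key prefix inside that bucket
endpoint = urlparse(volume["remote.s3.endpoint"])

print(f"bucket={bucket} prefix={prefix} host={endpoint.hostname} port={endpoint.port}")
```

The bucket component is the part that has to exist on the MinIO side beforehand; the prefix is created implicitly as objects are written.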


For distributed environments, it is recommended to point the endpoint at the round-robin
DNS entry for the cluster. For this test, round-robin DNS was simulated with an /etc/hosts entry on
each indexer pointing to a single MinIO node, resulting in a 1:1 mapping between client (Splunk indexer)
and server (MinIO node).
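

The 1:1 mapping can be sketched as follows. The indexer hostnames and the helper one_to_one_hosts are hypothetical, and the address range is assumed from the MINIO_VOLUMES setting shown earlier; the paper does not list the actual /etc/hosts entries used.

```python
import ipaddress


def one_to_one_hosts(indexers, first_minio_ip, alias="miniocluster"):
    """Return {indexer: '/etc/hosts line'} pairing indexer i with MinIO node i."""
    base = ipaddress.IPv4Address(first_minio_ip)
    return {name: f"{base + i}\t{alias}" for i, name in enumerate(indexers)}


# Hypothetical indexer hostnames; the MinIO range 10.88.126.21-30 is assumed
# from the MINIO_VOLUMES setting.
indexers = [f"splunk-indexer{n:02d}" for n in range(1, 11)]
mapping = one_to_one_hosts(indexers, "10.88.126.21")

for name, line in mapping.items():
    print(f"{name}: {line}")
```

Each indexer resolves the same alias to a different MinIO node, which spreads client traffic evenly without a DNS server or load balancer in the test environment.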


From the Splunk cluster master, add these changes to
$SPLUNK_PATH/etc/master-apps/_cluster/local/indexes.conf, for example,
/opt/splunk/etc/master-apps/_cluster/local/indexes.conf. After modifying the file, changes
can be pushed to all indexer nodes:


root@splunk-cluster-master:~# splunk apply cluster-bundle --answer-yes
Created new bundle with checksum=4BD58C40847DC1F5173BD8FDED6903F3
Applying new bundle. The peers may restart depending on the configurations in
applied bundle.
Please run 'splunk show cluster-bundle-status' for checking the status of the
applied bundle.


After configuring MinIO in Splunk, Splunk internal buckets (such as _audit and _telemetry) are
automatically created. This can be checked via the MinIO client (mc):


mc ls miniocluster/smartstore/remote_volume/
[2019-10-01 11:01:43 PDT]     0B _audit/
[2019-10-01 11:01:43 PDT]     0B _internal/
[2019-10-01 11:01:43 PDT]     0B _introspection/
[2019-10-01 11:01:43 PDT]     0B _telemetry/


After the environment has reached the threshold for rolling hot buckets, index buckets will be
rolled to MinIO as well:


mc ls miniocluster/smartstore/remote_volume/
[2019-10-01 14:06:13 PDT]     0B _audit/
[2019-10-01 14:06:13 PDT]     0B _internal/
[2019-10-01 14:06:13 PDT]     0B _introspection/
[2019-10-01 14:06:13 PDT]     0B _telemetry/
[2019-10-01 14:06:13 PDT]     0B minio_index/


Additionally, activity can be observed from the Splunk Monitoring Console under Indexing ->
SmartStore -> Activity and Cache Performance.


[Figure: Splunk Monitoring Console SmartStore panels, including the portion of time spent
downloading buckets from remote storage and cache hits/misses.]


LOADING DATA 







Data was loaded into Splunk using the gogen event-generating utility.


Initial load was 100GB per day per node for the previous ten days: 


root@splunk-cluster-master:~# ./gogen -c examples/weblog/weblog_data.yml -o file -f /opt/splunk/generated_data/minio_index/minio_events.log -g 12 gen -c 3551 -b="-10d"


MinIO for SmartStore | 06


This generated a total of 10TB of data across all nodes, or ~3 billion events per day
per node. The generated data is in weblog format. Sample events have the format:


27.35.11.11 - - [18/Sep/2019:23:59:59 +0000] "GET /product.screen?product_id=HolyGouda&JSESSIONID=SD3SL1FF7ADFF8 HTTP/1.1" 200 2243 "http://shop.buttercupgames.com/cart.do?action=view&itemId=HolyGouda" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 5_0_1 like Mac OS X; en_US) AppleWebKit (KHTML, like Gecko) Mobile [FBAN/FBForIPhone;FBAV/4.0.2;FBBV/4020.0;FBDV/iPhone3,1;FBMD/iPhone;FBSN/iPhone OS;FBSV/5.0.1;FBSS/2;FBCR/AT&T;FBID/phone;FBLC/en_US;FBSF/2.0]"
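

The 10TB total follows directly from the load parameters. A small worked calculation, using decimal units and assuming the per-node rate applies to each of the ten indexers:

```python
GB_PER_DAY_PER_NODE = 100   # initial load rate, from the text
DAYS = 10                   # backfill window (-b="-10d")
NODES = 10                  # indexers in the cluster

total_gb = GB_PER_DAY_PER_NODE * DAYS * NODES
total_tb = total_gb / 1000  # decimal units, matching the paper's "10TB"

print(f"{total_gb} GB = {total_tb:g} TB")
```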


The MinIO command line client, mc, has a built-in command called 'trace' that shows all
S3 requests that reach the server:


root@splunk-cluster-master:~# mc admin trace miniocluster
21:27:21.309 [200 OK] s3.PutObject miniocluster:9000/smartstore/remote_volume/minio_index/db/21/70/482~FF906DA0-5A09-4A5F-ABDF-3F64BFF58C64/receipt.json 20.39ms ↑ 1.8 KiB ↓ 261 B
21:27:22.726 [200 OK] s3.NewMultipartUpload miniocluster:9000/smartstore/remote_volume/minio_index/db/86/cd/486~7753C048-D6AB-4579-857E-790ECB03D107/guidSplunk-7753C048-D6AB-4579-857E-790ECB03D107/rawdata/journal.gz?uploads 1.031492s ↑ 69 B ↓ 637 B
21:27:24.822 [200 OK] s3.PutObjectPart miniocluster:9000/smartstore/remote_volume/minio_index/db/86/cd/486~7753C048-D6AB-4579-857E-790ECB03D107/guidSplunk-7753C048-D6AB-4579-857E-790ECB03D107/rawdata/journal.gz?partNumber=3&uploadId=aa0d8aeb-1caf-46b0-8843-d1c65a773270 573.302ms ↑ 79 MiB ↓ 261 B
21:27:24.822 [200 OK] s3.PutObjectPart miniocluster:9000/smartstore/remote_volume/minio_index/db/86/cd/486~7753C048-D6AB-4579-857E-790ECB03D107/guidSplunk-7753C048-D6AB-4579-857E-790ECB03D107/rawdata/journal.gz?partNumber=2&uploadId=aa0d8aeb-1caf-46b0-8843-d1c65a773270 861.608ms ↑ 128 MiB ↓ 261 B
21:27:24.822 [200 OK] s3.PutObjectPart miniocluster:9000/smartstore/remote_volume/minio_index/db/86/cd/486~7753C048-D6AB-4579-857E-790ECB03D107/guidSplunk-7753C048-D6AB-4579-857E-790ECB03D107/rawdata/journal.gz?partNumber=1&uploadId=aa0d8aeb-1caf-46b0-8843-d1c65a773270 1.287407s ↑ 128 MiB ↓ 261 B


To see only API requests, run:


mc admin trace --json miniocluster | jq '{path, "API": .api}'


root@splunk-indexer01:~# mc admin trace miniocluster/ --json | jq '{path, "API": .api}'
{
  "path": "/smartstore/remote_volume/minio_index/db/62/89/507~7753C048-D6AB-4579-857E-790ECB03D107/guidSplunk-7753C048-D6AB-4579-857E-790ECB03D107/rawdata/journal.gz",
  "API": "s3.NewMultiPartUpload"
}
{
  "path": "/smartstore/remote_volume/minio_index/db/62/89/507~7753C048-D6AB-4579-857E-790ECB03D107/guidSplunk-7753C048-D6AB-4579-857E-790ECB03D107/rawdata/journal.gz",
  "API": "s3.PutObjectPart"
}
{
  "path": "/smartstore/remote_volume/minio_index/db/62/89/507~7753C048-D6AB-4579-857E-790ECB03D107/guidSplunk-7753C048-D6AB-4579-857E-790ECB03D107/rawdata/journal.gz",
  "API": "s3.PutObjectPart"
}
{
  "path": "/smartstore/remote_volume/minio_index/db/62/89/507~7753C048-D6AB-4579-857E-790ECB03D107/guidSplunk-7753C048-D6AB-4579-857E-790ECB03D107/rawdata/journal.gz",
  "API": "s3.PutObjectPart"
}
{
  "path": "/smartstore/remote_volume/minio_index/db/62/89/507~7753C048-D6AB-4579-857E-790ECB03D107/guidSplunk-7753C048-D6AB-4579-857E-790ECB03D107/rawdata/journal.gz",
  "API": "s3.CompleteMultiPartUpload"
}
{
  "path": "/smartstore/remote_volume/minio_index/db/62/89/507~7753C048-D6AB-4579-857E-790ECB03D107/guidSplunk-7753C048-D6AB-4579-857E-790ECB03D107/rawdata/slicesv2.dat",
  "API": "s3.PutObjectPart"
}
{
  "path": "/smartstore/remote_volume/minio_index/db/62/89/507~7753C048-D6AB-4579-857E-790ECB03D107/guidSplunk-7753C048-D6AB-4579-857E-790ECB03D107/rawdata/slicesmin.dat",
  "API": "s3.PutObjectPart"
}
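

When the trace is long, a tally per API call can be easier to read than the raw stream. The following is a minimal sketch that assumes the JSON trace output carries one record per line with an "api" field (the field the jq filter above reads); the function name and the sample records are illustrative only.

```python
import json
from collections import Counter


def count_api_calls(lines):
    """Tally the "api" field over an iterable of JSON-encoded trace records."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        counts[record.get("api", "unknown")] += 1
    return counts


# Illustrative records only; real input would come from the trace stream.
sample = [
    '{"api": "s3.NewMultipartUpload", "path": "/smartstore/a/journal.gz"}',
    '{"api": "s3.PutObjectPart", "path": "/smartstore/a/journal.gz"}',
    '{"api": "s3.PutObjectPart", "path": "/smartstore/a/journal.gz"}',
    '{"api": "s3.CompleteMultipartUpload", "path": "/smartstore/a/journal.gz"}',
]

api_counts = count_api_calls(sample)
for api, n in api_counts.most_common():
    print(f"{api}: {n}")
```

To use it against a live cluster, the same function could be fed sys.stdin with the trace output piped into the script.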


Finally, for detailed information, use the -v switch for verbose mode:


root@splunk-indexer01:~# mc admin trace -v miniocluster
10.88.126.47 [REQUEST s3.GetObject] 22:58:18.809
10.88.126.47 POST /smartstore/remote_volume/minio_index/db/01/4a/2~356932B6-38BC-4A6D-877C-A767DB61560A/guidSplunk-356932B6-38BC-4A6D-877C-A767DB61560A/1568660462-1568324333-12826708760433608263.tsidx
10.88.126.47 Host: miniocluster:9000
10.88.126.47 Authorization: AWS4-HMAC-SHA256 Credential=minio/20191008//s3/aws4_request, SignedHeaders=host;range;x-amz-content-sha256;x-amz-date, Signature=c8fb025e3958d72e00b5872dbb0819196be1lc11c71196a25cc8454accc02ba96
10.88.126.47 Content-Length: 0
10.88.126.47 Range: bytes=134217728-268435455
10.88.126.47 Te: trailers, chunked
10.88.126.47 X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
10.88.126.47 X-Amz-Date: 20191008T225818Z
10.88.126.47 <BODY>
10.88.126.47 [RESPONSE] [22:58:19.380] [ Duration 571.316ms ↑ 75 B ↓ 128 MiB ]
10.88.126.47 206 Partial Content
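

The Range header in this request explains the 128 MiB response size. The sketch below works through the arithmetic; HTTP byte ranges are inclusive at both ends, and the helper name range_length is hypothetical.

```python
MIB = 1024 * 1024


def range_length(header_value: str) -> int:
    """Number of bytes requested by a single 'bytes=first-last' Range value."""
    unit, _, span = header_value.partition("=")
    if unit.strip() != "bytes":
        raise ValueError(f"unsupported range unit: {unit!r}")
    first, _, last = span.partition("-")
    return int(last) - int(first) + 1   # both ends are inclusive


length = range_length("bytes=134217728-268435455")
chunk_index = 134217728 // length       # zero-based position of this chunk

print(f"{length // MIB} MiB, chunk #{chunk_index}")
```

In other words, this request fetches the second 128 MiB slice of the .tsidx object, which is consistent with the 206 Partial Content status in the trace.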


In addition to the regular events, one special event per minute was inserted with the search
term "manticore":


[28/Sep/2019:23:59:21 +0000] "GET /product.screen?product_id=HolyGouda&JSESSIONID=SD3SL1FF7ADFF8 HTTP/1.1" 404 1661 "http://shop.buttercupgames.com/cart.do?action=view&itemId=manticore" "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us)


This enables targeted searches that will scan multiple buckets. 


When loading events via gogen, the system load on the indexers is minimal (load average of 3.66,
1.46, 0.58), as shown below:


top - 20:04:48 up 5 days, 2:44, 2 users, load average: 3.66, 1.46, 0.58
Tasks: 663 total, 2 running, 343 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.8 us, 2.8 sy, 0.0 ni, 91.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 39491139+total, 10764680 free, 2950712 used, 38119600+buff/cache
KiB Swap: 1996796 total, 1742332 free, 254464 used. 38657564+avail Mem

   PID USER  PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
 34544 root  20   0 1638960 200768 39876 S 237.0  0.1   4571:01 splunkd
996885 root  20   0 4041096  27248 10112 S 197.0  0.0   3:33.97 gogen
997929 root  20   0  197496  19396  2052 R  65.0  0.0   0:01.97 splunk-optimize
996992 root  20   0       0      0     0 I   1.0  0.0   0:00.13 kworker/u113:3
 34562 root  20   0 1648644  31184 18184 S   0.7  0.0  18:32.50 mongod


In addition, mc has the ability to query the cluster for resource usage:


root@splunk-indexer01:~# mc admin info server miniocluster
●  10.88.126.17:9000
   Uptime: 3 days
   Version: 2019-09-25T18:25:51Z
   Storage: Used 4.5 TiB, Free 30 TiB
   Drives: 1/1 OK

   CPU        min      avg      max
   current    0.15%    0.17%    0.19%
   historic   0.02%    1.55%    1286.78%

   MEM        usage
   current    1.6 GiB
   historic   961 GiB

●  10.88.126.23:9000
   Uptime: 4 days
   Version: 2019-09-26T19:42:35Z
   Storage: Used 4.5 TiB, Free 30 TiB
   Drives: 1/1 OK

   CPU        min      avg      max
   current    0.09%    0.12%    0.16%
   historic   0.01%    1.81%    1542.97%

   MEM        usage
   current    2.3 GiB
   historic   857 GiB


From this snippet, it is apparent that the MinIO servers use very little RAM and CPU.
In all cases, the systems were never under more than nominal resource strain.


SEARCH METRICS 


Using the targeted search term from before, queries were run against several days of
backdated data.


Before the search was run, the Splunk indexer cache was cleared. This represents a "worst case
scenario", in which SmartStore is forced to download all buckets before being able to return
a search result.


curl -ku admin:changeme "https://localhost:8089/services/admin/cacheman/_evict" -d path=/opt/splunk -d mb=99999999999




Bucket eviction is confirmed via the Splunk Monitoring Console:


[Figure: "Buckets Evicted" chart from the Splunk Monitoring Console. Y-axis: Buckets; x-axis:
time on Mon Sep 30 2019, from 10:00 PM. The data point at Sep 30, 2019 10:23 PM shows 371
buckets evicted.]


From the search head, the following query is issued: 


time splunk search 'index=minio_index manticore' -maxout 0 | wc -l


Directly after the cache is flushed, the following is seen:


root@splunk-search-head:~# time splunk search 'index=minio_index manticore' -maxout 0 | wc -l
172800

real 0m39.925s
user 0m8.821s
sys 0m2.538s


Monitoring the output of mc admin trace showed a flurry of GET requests, indicating that
SmartStore was actively downloading buckets and objects to search. Because the search term occurs
every minute across the ten days of loaded data, a large number of
buckets must be downloaded. With the combination of Splunk's efficiency in search and
compression, and MinIO's ability to quickly store and retrieve data, the results of the "worst case
scenario" (searching against terabytes of events across 10 servers) are returned in under 40 seconds.
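

The count of 172800 matching events can be cross-checked against the load parameters. One reading that reproduces the number, offered here as an assumption rather than something the paper states, is that "-g 12" in the gogen command runs twelve generators and each one emits the special event once per minute over the ten-day window:

```python
GENERATORS = 12            # assumed meaning of "-g 12" in the gogen command
MINUTES_PER_DAY = 24 * 60
DAYS = 10                  # backfill window

expected_events = GENERATORS * MINUTES_PER_DAY * DAYS
reported = 172800          # line count returned by the manticore search

print(expected_events, expected_events == reported)
```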


In a real-world scenario, Splunk queries are typically targeted to a limited number of events to
avoid situations such as the one detailed above. To simulate this behavior, a unique record, identified
by the term "unicorn", was created. After clearing the cache, Splunk needs only to find and
download the bucket that holds the event:


root@splunk-search-head:~# time splunk search 'index=minio_index unicorn' -maxout 0 | wc -l
1

real 0m13.052s
user 0m0.292s
sys 0m0.079s


After the event has been returned to cache, the search is nearly instantaneous: 


root@splunk-search-head:~# time splunk search 'index=minio_index unicorn' -maxout 0 | wc -l
1

real 0m0.926s
user 0m0.288s
sys 0m0.084s
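

To compare the three measurements side by side, the wall-clock ("real") values can be converted to seconds. This is a small sketch with a hypothetical helper, parse_real, that handles the "XmY.YYYs" format printed by the shell's time command:

```python
import re


def parse_real(value: str) -> float:
    """Convert a shell `time` value such as '0m13.052s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


timings = {
    "manticore, cold cache": parse_real("0m39.925s"),
    "unicorn, cold cache": parse_real("0m13.052s"),
    "unicorn, warm cache": parse_real("0m0.926s"),
}
speedup = timings["unicorn, cold cache"] / timings["unicorn, warm cache"]

for label, secs in timings.items():
    print(f"{label}: {secs:.3f}s")
print(f"warm-cache speedup for the single-event search: {speedup:.1f}x")
```

For the single-event search, the cached run is roughly 14 times faster than the run that first has to fetch the bucket from MinIO.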


CONCLUSION 


MinIO and Splunk SmartStore are exceptionally well suited to each other in deployment scenarios
that emphasize performance at scale. MinIO's inherent simplicity and scaling properties allow
petabyte-plus deployments to be managed via Splunk, providing significant cost savings
versus AWS while retaining full control and security of the data assets.